On the Performance Bounds of some Policy Search Dynamic Programming Algorithms
Abstract
We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on Policy Search algorithms, which compute an approximately optimal policy by following the standard Policy Iteration (PI) scheme via an ε-approximate greedy operator (Kakade and Langford, 2002; Lazaric et al., 2010). We describe existing and a few new performance bounds for Direct Policy Iteration (DPI) (Lagoudakis and Parr, 2003; Fern et al., 2006; Lazaric et al., 2010) and Conservative Policy Iteration (CPI) (Kakade and Langford, 2002). By paying particular attention to the concentrability constants involved in such guarantees, we notably argue that the guarantee of CPI is much better than that of DPI, but that this comes at the cost of a relative increase in time complexity that is exponential in 1/ε. We then describe an algorithm, Non-Stationary Direct Policy Iteration (NSDPI), that can be seen either as 1) a variation of Policy Search by Dynamic Programming by Bagnell et al. (2003) adapted to the infinite-horizon setting or 2) a simplified version of the Non-Stationary PI with growing period of Scherrer and Lesner (2012). Our analysis of this algorithm shows in particular that it enjoys the best of both worlds: its performance guarantee is similar to that of CPI, while its time complexity is similar to that of DPI.
Similar resources
Efficient Algorithms for Just-In-Time Scheduling on a Batch Processing Machine
Just-in-time scheduling problem on a single batch processing machine is investigated in this research. Batch processing machines can process more than one job simultaneously and are widely used in semi-conductor industries. Due to the requirements of just-in-time strategy, minimization of total earliness and tardiness penalties is considered as the criterion. It is an acceptable criterion for b...
A Framework for Adapting Population-Based and Heuristic Algorithms for Dynamic Optimization Problems
In this paper, a general framework is presented to extend heuristic optimization algorithms based on swarm intelligence from static to dynamic environments. In dynamic optimization problems, as opposed to static ones, the evaluation function or the constraints change over time, and hence so does the location of the optimum. The subject matter of the framework is based on the variability of the ...
Modeling and scheduling no-idle hybrid flow shop problems
Although several papers have studied no-idle scheduling problems, they all focus on flow shops, assuming one processor at each working stage. But, companies commonly extend to hybrid flow shops by duplicating machines in parallel in stages. This paper considers the problem of scheduling no-idle hybrid flow shops. A mixed integer linear programming model is first developed to mathematically form...
A New Mathematical Model for a Multi-product Supply Chain Network with a Preventive Maintenance Policy
The supply chain network design (SCND) implicates decision-making at a strategic level and makes it possible to create an effective and helpful context for managing. The aim of the network is to minimize the total cost so that customer's demands should be met. Preventive maintenance is pre-determined work performed to a schedule with the aim of preventing the wear and tear or sudden failure of ...
Mathematical Programming Models for Solving Unequal-Sized Facilities Layout Problems - a Generic Search Method
This paper presents unequal-sized facilities layout solutions generated by a genetic search program named LADEGA (Layout Design using a Genetic Algorithm). The generalized quadratic assignment problem, which requires pre-determined distance and material flow matrices as input data, and the continuous plane model, which employs a dynamic distance measure and a material flow matrix, are discussed. Computa...
Journal: CoRR
Volume: abs/1306.0539
Publication date: 2013